---
sidebar_class_name: node-only
---

# Llama CPP

:::tip Compatibility
Only available on Node.js.
:::

This module is based on the [node-llama-cpp](https://github.com/withcatai/node-llama-cpp) Node.js bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp), allowing you to work with a locally running LLM. This allows you to work with a much smaller quantized model capable of running on a laptop environment, ideal for testing and scratch padding ideas without running up a bill!

## Setup

You'll need to install the [node-llama-cpp](https://github.com/withcatai/node-llama-cpp) module to communicate with your local model.

import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

```bash npm2yarn
npm install -S node-llama-cpp @langchain/community
```

You will also need a local Llama 2 model (or a model supported by [node-llama-cpp](https://github.com/withcatai/node-llama-cpp)). You will need to pass the path to this model to the LlamaCpp module as a part of the parameters (see example).

Out-of-the-box `node-llama-cpp` is tuned for running on a MacOS platform with support for the Metal GPU of Apple M-series of processors. If you need to turn this off or need support for the CUDA architecture then refer to the documentation at [node-llama-cpp](https://withcatai.github.io/node-llama-cpp/).

For advice on getting and preparing `llama2` see the documentation for the LLM version of this module.

A note to LangChain.js contributors: if you want to run the tests associated with this module you will need to put the path to your local model in the environment variable `LLAMA_PATH`.

## Usage

### Basic use

In this case we pass in a prompt wrapped as a message and expect a response.

import CodeBlock from "@theme/CodeBlock";
import BasicExample from "@examples/models/chat/integration_llama_cpp.ts";

<CodeBlock language="typescript">{BasicExample}</CodeBlock>

### System messages

We can also provide a system message, note that with the `llama_cpp` module a system message will cause the creation of a new session.

import SystemExample from "@examples/models/chat/integration_llama_cpp_system.ts";

<CodeBlock language="typescript">{SystemExample}</CodeBlock>

### Chains

This module can also be used with chains, note that using more complex chains will require suitably powerful version of `llama2` such as the 70B version.

import ChainExample from "@examples/models/chat/integration_llama_cpp_chain.ts";

<CodeBlock language="typescript">{ChainExample}</CodeBlock>

### Streaming

We can also stream with Llama CPP, this can be using a raw 'single prompt' string:

import StreamExample from "@examples/models/chat/integration_llama_cpp_stream.ts";

<CodeBlock language="typescript">{StreamExample}</CodeBlock>

Or you can provide multiple messages, note that this takes the input and then submits a Llama2 formatted prompt to the model.

import StreamMultiExample from "@examples/models/chat/integration_llama_cpp_stream_multi.ts";

<CodeBlock language="typescript">{StreamMultiExample}</CodeBlock>
